<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
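	<title>Hunspell token filter | ElasticSearch 7.7: The Definitive Guide (Chinese edition)</title>
	<meta name="keywords" content="ElasticSearch: The Definitive Guide (Chinese edition), elasticsearch 7, es7, real-time data analysis, real-time data retrieval" />
    <meta name="description" content="ElasticSearch: The Definitive Guide (Chinese edition), elasticsearch 7, es7, real-time data analysis, real-time data retrieval" />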
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-hunspell-tokenfilter.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">Original English version: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-hunspell-tokenfilter.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-hunspell-tokenfilter.html</a>. Copyright of the original documentation belongs to www.elastic.co.<br/>Local English version: <a href="../en/analysis-hunspell-tokenfilter.html" rel="nofollow" target="_blank">../en/analysis-hunspell-tokenfilter.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
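<strong>Important</strong>: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">current release documentation</a>.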
</div>
<div id="content">
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-hunspell-tokenfilter"></a>Hunspell token filter
</h2>
</div></div></div>

<p>Provides <a class="xref" href="stemming.html#dictionary-stemmers" title="Dictionary stemmers">dictionary stemming</a> based on a provided
<a href="http://en.wikipedia.org/wiki/Hunspell" class="ulink" target="_top">Hunspell dictionary</a>. The <code class="literal">hunspell</code>
filter requires
<a class="xref" href="analysis-hunspell-tokenfilter.html#analysis-hunspell-tokenfilter-dictionary-config" title="Configure Hunspell dictionaries">configuration</a> of one or more
language-specific Hunspell dictionaries.</p>
<p>This filter uses Lucene’s
<a href="https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/hunspell/HunspellStemFilter.html" class="ulink" target="_top">HunspellStemFilter</a>.</p>
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>If available, we recommend trying an algorithmic stemmer for your language
before using the <a class="xref" href="analysis-hunspell-tokenfilter.html" title="Hunspell token filter"><code class="literal">hunspell</code></a> token filter.
In practice, algorithmic stemmers typically outperform dictionary stemmers.
See <a class="xref" href="stemming.html#dictionary-stemmers" title="Dictionary stemmers">Dictionary stemmers</a>.</p>
</div>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-hunspell-tokenfilter-dictionary-config"></a>Configure Hunspell dictionaries
</h3>
</div></div></div>
<p>By default, Hunspell dictionaries are stored in, and detected from, a dedicated
hunspell directory on the filesystem: <code class="literal">&lt;path.config&gt;/hunspell</code>. Each dictionary
is expected to have its own directory, named after its associated language and
locale (e.g., <code class="literal">pt_BR</code>, <code class="literal">en_GB</code>). This dictionary directory is expected to hold a
single <code class="literal">.aff</code> and one or more <code class="literal">.dic</code> files, all of which will automatically be
picked up. For example, assuming the default <code class="literal">&lt;path.config&gt;/hunspell</code> path
is used, the following directory layout will define the <code class="literal">en_US</code> dictionary:</p>
<div class="pre_wrapper lang-txt">
<pre class="programlisting prettyprint lang-txt">- config
    |-- hunspell
    |    |-- en_US
    |    |    |-- en_US.dic
    |    |    |-- en_US.aff</pre>
</div>
<p>Each dictionary can be configured with one setting:</p>
<div class="variablelist">
<a id="analysis-hunspell-ignore-case-settings"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">ignore_case</code>
</span>
</dt>
<dd>
(Static, boolean)
If true, dictionary matching will be case insensitive. Defaults to <code class="literal">false</code>.
</dd>
</dl>
</div>
<p>This setting can be configured globally in <code class="literal">elasticsearch.yml</code> using
<code class="literal">indices.analysis.hunspell.dictionary.ignore_case</code>.</p>
<p>To configure the setting for a specific locale, use the
<code class="literal">indices.analysis.hunspell.dictionary.&lt;locale&gt;.ignore_case</code> setting (e.g., for
the <code class="literal">en_US</code> (American English) locale, the setting is
<code class="literal">indices.analysis.hunspell.dictionary.en_US.ignore_case</code>).</p>
<p>You can also add a <code class="literal">settings.yml</code> file to the dictionary
directory to hold these settings. This overrides any other <code class="literal">ignore_case</code>
settings defined in <code class="literal">elasticsearch.yml</code>.</p>
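<p>As a minimal sketch of the two <code class="literal">elasticsearch.yml</code> forms described above, the
following fragment states the global default explicitly and enables case-insensitive
matching for the <code class="literal">en_US</code> dictionary. The values are illustrative assumptions,
not recommendations:</p>
<div class="pre_wrapper lang-yaml">
<pre class="programlisting prettyprint lang-yaml"># elasticsearch.yml (static settings, read when the node starts)

# Global setting for all Hunspell dictionaries (false is already the default)
indices.analysis.hunspell.dictionary.ignore_case: false

# Locale-specific setting for the en_US dictionary
indices.analysis.hunspell.dictionary.en_US.ignore_case: true</pre>
</div>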
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-hunspell-tokenfilter-analyze-ex"></a>Example
</h3>
</div></div></div>
<p>The following analyze API request uses the <code class="literal">hunspell</code> filter to stem
<code class="literal">the foxes jumping quickly</code> to <code class="literal">the fox jump quick</code>.</p>
<p>The request specifies the <code class="literal">en_US</code> locale, meaning that the
<code class="literal">.aff</code> and <code class="literal">.dic</code> files in the <code class="literal">&lt;path.config&gt;/hunspell/en_US</code> directory are used
for the Hunspell dictionary.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "hunspell",
      "locale": "en_US"
    }
  ],
  "text": "the foxes jumping quickly"
}</pre>
</div>
<p>The filter produces the following tokens:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ the, fox, jump, quick ]</pre>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-hunspell-tokenfilter-configure-parms"></a>Configurable parameters
</h3>
</div></div></div>
<div class="variablelist">
<a id="analysis-hunspell-tokenfilter-dictionary-param"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">dictionary</code>
</span>
</dt>
<dd>
<p>
(Optional, string or array of strings)
One or more <code class="literal">.dic</code> files (e.g., <code class="literal">en_US.dic, my_custom.dic</code>) to use for the
Hunspell dictionary.
</p>
<p>By default, the <code class="literal">hunspell</code> filter uses all <code class="literal">.dic</code> files in the
<code class="literal">&lt;path.config&gt;/hunspell/&lt;locale&gt;</code> directory specified using the
<code class="literal">lang</code>, <code class="literal">language</code>, or <code class="literal">locale</code> parameter. To use another directory, the
directory’s path must be registered using the
<a class="xref" href="analysis-hunspell-tokenfilter.html#indices-analysis-hunspell-dictionary-location"><code class="literal">indices.analysis.hunspell.dictionary.location</code></a> setting.</p>
</dd>
<dt>
<span class="term">
<code class="literal">dedup</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, duplicate tokens are removed from the filter’s output. Defaults to
<code class="literal">true</code>.
</dd>
<dt>
<span class="term">
<code class="literal">lang</code>
</span>
</dt>
<dd>
<p>
(Required*, string)
An alias for the <a class="xref" href="analysis-hunspell-tokenfilter.html#analysis-hunspell-tokenfilter-locale-param"><code class="literal">locale</code>
parameter</a>.
</p>
<p>If this parameter is not specified, the <code class="literal">language</code> or <code class="literal">locale</code> parameter is
required.</p>
</dd>
<dt>
<span class="term">
<code class="literal">language</code>
</span>
</dt>
<dd>
<p>
(Required*, string)
An alias for the <a class="xref" href="analysis-hunspell-tokenfilter.html#analysis-hunspell-tokenfilter-locale-param"><code class="literal">locale</code>
parameter</a>.
</p>
<p>If this parameter is not specified, the <code class="literal">lang</code> or <code class="literal">locale</code> parameter is
required.</p>
</dd>
</dl>
</div>
<div class="variablelist">
<a id="analysis-hunspell-tokenfilter-locale-param"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">locale</code>
</span>
</dt>
<dd>
<p>
(Required*, string)
Locale directory used to specify the <code class="literal">.aff</code> and <code class="literal">.dic</code> files for a Hunspell
dictionary. See <a class="xref" href="analysis-hunspell-tokenfilter.html#analysis-hunspell-tokenfilter-dictionary-config" title="Configure Hunspell dictionaries">Configure Hunspell dictionaries</a>.
</p>
<p>If this parameter is not specified, the <code class="literal">lang</code> or <code class="literal">language</code> parameter is
required.</p>
</dd>
<dt>
<span class="term">
<code class="literal">longest_only</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, only the longest stemmed version of each token is
included in the output. If <code class="literal">false</code>, all stemmed versions of the token are
included. Defaults to <code class="literal">false</code>.
</dd>
</dl>
</div>
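<p>As an illustrative sketch, the following analyze API request sets two of the optional
parameters above inline, alongside the required <code class="literal">locale</code>. It assumes an <code class="literal">en_US</code>
dictionary is installed under <code class="literal">&lt;path.config&gt;/hunspell/en_US</code>. The tokens returned
depend on the contents of that dictionary, so no output is shown here.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "hunspell",
      "locale": "en_US",
      "dedup": true,
      "longest_only": true
    }
  ],
  "text": "the foxes jumping quickly"
}</pre>
</div>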
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-hunspell-tokenfilter-analyzer-ex"></a>Customize and add to an analyzer
</h3>
</div></div></div>
<p>To customize the <code class="literal">hunspell</code> filter, duplicate it to create the
basis for a new custom token filter. You can modify the filter using its
configurable parameters.</p>
<p>For example, the following <a class="xref" href="indices-create-index.html" title="Create index API">create index API</a> request
uses a custom <code class="literal">hunspell</code> filter, <code class="literal">my_en_US_dict_stemmer</code>, to configure a new
<a class="xref" href="analysis-custom-analyzer.html" title="Create a custom analyzer">custom analyzer</a>.</p>
<p>The <code class="literal">my_en_US_dict_stemmer</code> filter uses a <code class="literal">locale</code> of <code class="literal">en_US</code>, meaning that the
<code class="literal">.aff</code> and <code class="literal">.dic</code> files in the <code class="literal">&lt;path.config&gt;/hunspell/en_US</code> directory are
used. The filter also includes a <code class="literal">dedup</code> argument of <code class="literal">false</code>, meaning that
duplicate tokens added from the dictionary are not removed from the filter’s
output.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "en": {
          "tokenizer": "standard",
          "filter": [ "my_en_US_dict_stemmer" ]
        }
      },
      "filter": {
        "my_en_US_dict_stemmer": {
          "type": "hunspell",
          "locale": "en_US",
          "dedup": false
        }
      }
    }
  }
}</pre>
</div>
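<p>Once the index exists, one way to check the custom analyzer is to call the analyze API on
the index and name the analyzer. This is a sketch that assumes the <code class="literal">my_index</code> request
above succeeded and that the <code class="literal">en_US</code> dictionary is installed. The sample text is
arbitrary, and the tokens returned depend on the dictionary.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /my_index/_analyze
{
  "analyzer": "en",
  "text": "the foxes jumping quickly"
}</pre>
</div>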
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-hunspell-tokenfilter-settings"></a>Settings
</h3>
</div></div></div>
<p>In addition to the <a class="xref" href="analysis-hunspell-tokenfilter.html#analysis-hunspell-ignore-case-settings"><code class="literal">ignore_case</code>
settings</a>, you can configure the following global settings for the <code class="literal">hunspell</code>
filter using <code class="literal">elasticsearch.yml</code>:</p>
<div class="variablelist">
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">indices.analysis.hunspell.dictionary.lazy</code>
</span>
</dt>
<dd>
(Static, boolean)
If <code class="literal">true</code>, the loading of Hunspell dictionaries is deferred until a dictionary
is used. If <code class="literal">false</code>, the dictionary directory is checked for dictionaries when
the node starts, and any dictionaries are automatically loaded. Defaults to
<code class="literal">false</code>.
</dd>
</dl>
</div>
<div class="variablelist">
<a id="indices-analysis-hunspell-dictionary-location"></a>
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">indices.analysis.hunspell.dictionary.location</code>
</span>
</dt>
<dd>
<p>
(Static, string)
Path to a Hunspell dictionary directory. This path must be absolute or
relative to the <code class="literal">config</code> location.
</p>
<p>By default, the <code class="literal">&lt;path.config&gt;/hunspell</code> directory is used, as described in
<a class="xref" href="analysis-hunspell-tokenfilter.html#analysis-hunspell-tokenfilter-dictionary-config" title="Configure Hunspell dictionaries">Configure Hunspell dictionaries</a>.</p>
</dd>
</dl>
</div>
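<p>A hedged sketch of how these two settings might appear in <code class="literal">elasticsearch.yml</code>. The
directory path is a hypothetical placeholder, assumed to contain locale subdirectories
such as <code class="literal">en_US</code> in the same layout as the default directory:</p>
<div class="pre_wrapper lang-yaml">
<pre class="programlisting prettyprint lang-yaml"># elasticsearch.yml (static settings, read when the node starts)

# Defer loading each Hunspell dictionary until it is first used
indices.analysis.hunspell.dictionary.lazy: true

# Hypothetical absolute path to the Hunspell dictionary directory
indices.analysis.hunspell.dictionary.location: /opt/dictionaries/hunspell</pre>
</div>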
</div>

</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>